Application Performance Modeling via Tensor Completion
Performance tuning, software/hardware co-design, and job scheduling are among
the many tasks that rely on models to predict application performance. We
propose and evaluate low-rank tensor decomposition for modeling application
performance. We discretize the input and configuration domains of an
application using regular grids. Application execution times mapped within
grid-cells are averaged and represented by tensor elements. We show that
low-rank canonical-polyadic (CP) tensor decomposition is effective in
approximating these tensors. We further show that this decomposition enables
accurate extrapolation to unobserved regions of an application's parameter
space. We then employ tensor completion to optimize a CP decomposition given a
sparse set of observed execution times. We consider alternative
piecewise/grid-based models and supervised learning models for six applications
and demonstrate that CP decomposition optimized using tensor completion offers
higher prediction accuracy and memory efficiency for high-dimensional
performance modeling.
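The pipeline described in this abstract (grid discretization, per-cell averaging, then CP decomposition fitted only to observed cells) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `build_tensor`, `cp_complete`, and `cp_predict`, the ridge term `reg`, and the choice of alternating least squares as the optimizer are all assumptions made for illustration.

```python
import numpy as np

def build_tensor(configs, times, edges):
    """Average execution times that fall into each grid cell.

    configs: (n_samples, n_dims) array of parameter values (hypothetical input).
    times:   (n_samples,) array of measured execution times.
    edges:   one sorted array of cell boundaries per dimension.
    Returns the tensor of cell means and a boolean mask of observed cells.
    """
    shape = tuple(len(e) - 1 for e in edges)
    sums = np.zeros(shape)
    counts = np.zeros(shape)
    # Map each sample to a cell index along every dimension.
    idx = tuple(
        np.clip(np.searchsorted(e, configs[:, d], side="right") - 1, 0, len(e) - 2)
        for d, e in enumerate(edges)
    )
    np.add.at(sums, idx, times)
    np.add.at(counts, idx, 1)
    mask = counts > 0
    tensor = np.zeros(shape)
    tensor[mask] = sums[mask] / counts[mask]
    return tensor, mask

def cp_complete(tensor, mask, rank, n_iters=50, reg=1e-6, seed=0):
    """Fit CP factors to the observed entries only, via ridge-regularized ALS."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((n, rank)) for n in tensor.shape]
    obs = np.argwhere(mask)   # indices of observed cells, C order
    vals = tensor[mask]       # matching values, same C order
    for _ in range(n_iters):
        for mode in range(tensor.ndim):
            # For each observed entry, multiply the rows of all other factors.
            design = np.ones((len(obs), rank))
            for m in range(tensor.ndim):
                if m != mode:
                    design *= factors[m][obs[:, m]]
            # Each row of this mode's factor is an independent least-squares problem.
            for i in range(tensor.shape[mode]):
                rows = obs[:, mode] == i
                if not rows.any():
                    continue  # no observations in this slice; keep current row
                A = design[rows]
                gram = A.T @ A + reg * np.eye(rank)
                factors[mode][i] = np.linalg.solve(gram, A.T @ vals[rows])
    return factors

def cp_predict(factors):
    """Reconstruct the dense tensor as a sum of rank-one outer products."""
    out = np.zeros(tuple(f.shape[0] for f in factors))
    for r in range(factors[0].shape[1]):
        comp = factors[0][:, r]
        for f in factors[1:]:
            comp = np.multiply.outer(comp, f[:, r])
        out += comp
    return out
```

In this sketch the model stores only the factor matrices (one row per grid point per dimension, `rank` columns each), which is where the memory saving over a dense grid would come from. Predictions for unobserved cells are read from `cp_predict(factors)`.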
Communication lower bounds for nested bilinear algorithms
We develop lower bounds on communication in the memory hierarchy or between
processors for nested bilinear algorithms, such as Strassen's algorithm for
matrix multiplication. We build on a previous framework that establishes
communication lower bounds by use of the rank expansion, or the minimum rank of
any fixed size subset of columns of a matrix, for each of the three matrices
encoding the bilinear algorithm. This framework provides lower bounds for any
way of computing a bilinear algorithm, which encompasses a larger space of
algorithms than is obtained by fixing a particular dependency graph. Nested bilinear
algorithms include fast recursive algorithms for convolution, matrix
multiplication, and contraction of tensors with symmetry. Two bilinear
algorithms can be nested by taking Kronecker products between their encoding
matrices. Our main result is a lower bound on the rank expansion of a matrix
constructed by a Kronecker product derived from lower bounds on the rank
expansion of the Kronecker product's operands. To prove this bound, we map a
subset of columns from a submatrix to a 2D grid, collapse them into a dense
grid, expand the grid, and use the size of the expanded grid to bound the
number of linearly independent columns of the submatrix. We apply the rank
expansion lower bounds to obtain novel communication lower bounds for nested
Toom-Cook convolution, Strassen's algorithm, and fast algorithms for partially
symmetric contractions.
Comment: 37 pages, 5 figures, 1 table. Update includes log-log convex/concave
functions to fix previous bug in v
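As a small illustration of the nesting and rank-expansion notions in the abstract above, the sketch below encodes Karatsuba (Toom-2) multiplication as three matrices, nests it with itself via Kronecker products, and computes the rank expansion by brute force. The encoding convention c = F_C((F_A^T a) * (F_B^T b)) and the helper names `apply_bilinear`, `nest`, and `rank_expansion` are assumptions chosen for this example, not taken from the paper, and the brute-force search is only feasible for very small matrices.

```python
import itertools
import numpy as np

# Karatsuba as a bilinear algorithm of rank 3 (assumed convention:
# columns of F_A and F_B give the linear combinations that are multiplied,
# F_C recombines the three products into the output coefficients).
F_A = np.array([[1, 0, 1],
                [0, 1, 1]])
F_B = F_A.copy()
F_C = np.array([[1, 0, 0],
                [-1, -1, 1],
                [0, 1, 0]])

def apply_bilinear(FA, FB, FC, a, b):
    """Evaluate c = FC @ ((FA^T a) * (FB^T b)), with * taken elementwise."""
    return FC @ ((FA.T @ a) * (FB.T @ b))

def nest(alg1, alg2):
    """Nest two bilinear algorithms by Kronecker products of their encoding matrices."""
    return tuple(np.kron(M1, M2) for M1, M2 in zip(alg1, alg2))

def rank_expansion(F, k):
    """Minimum rank over all subsets of k columns of F (brute force)."""
    n_cols = F.shape[1]
    return min(
        np.linalg.matrix_rank(F[:, list(cols)])
        for cols in itertools.combinations(range(n_cols), k)
    )

karatsuba = (F_A, F_B, F_C)
nested = nest(karatsuba, karatsuba)  # 9 products instead of 16 for 2x2 inputs
```

Under this convention the nested algorithm acts on row-major flattened 2x2 arrays and produces their 3x3 two-dimensional linear convolution, so `apply_bilinear(*nested, X.ravel(), Y.ravel()).reshape(3, 3)` can be checked against a direct double loop. Calling `rank_expansion(nested[0], k)` for small `k` gives the exact quantity that the paper bounds from below in terms of the operands' rank expansions.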